Wide Area Information Servers:
A Supercomputer on Every Desk


Brewster Kahle
Thinking Machines Corporation




What I really want...

My personal information to be accessible
Published information to find me
Usable anywhere

Others can use what I have learned
(if I want them to)

New Communications Technology Problems

Technologies compared: Books, Telegraph, Telephone, Electronic Publishing

Experts only:
  Monks; Operators; Professional searchers

Distribution is hard and expensive:
  Vellum is calfskin; Telephones on barb wire; $1/minute over obscure modems

Different interfaces:
  1000s of languages in Europe alone; Switching was manual; query (W5) inform?

Material is intractable:
  Scrolls and manuscripts were about as random access as musical scores; No white pages; 600 databases on Dialog (~1 Terabyte), 140 GB at DJ, 80 GB card catalog at RLG

Business model needed:
  Centralized printing; Pay per minute; Not understood

Navigation Techniques: Paper

Alphabetical Listings (dictionary, encyclopedia)

Indices (back of the book and Readers' Guide)

Table of Contents (outlining)

Citation index

"Tree of Knowledge"

Have you read any good books lately?

Navigation Techniques: Computers

• Hierarchical file systems

• Unix "find" and "grep", Mac "Find File"

• Gopher, Magellan, ON Location

• Boolean query systems (...within 5 words of...)

• Static hypertext links ("see also" pointers)

Navigation Techniques: WAIS


English-language questions and relevance feedback

- Question-answer dialog

- Similar to newspapers: "More on page 5"

- Dynamic hypertext links

Two-level search:

- Directory of servers (a server like any other)

- Servers themselves
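A minimal sketch of the two-level search. Everything in it is assumed for illustration: the server names, their descriptions, the documents, and the word-overlap scoring (`score`), which stands in for whatever ranking real servers used. What it shows is the structure: the directory is queried exactly like any other server, and its answers name the servers to ask next.

```python
# Hypothetical sketch of a two-level WAIS search. The server names,
# descriptions, and documents are invented for illustration.

def score(question, text):
    """Toy relevance: how many question words appear in the text."""
    words = set(text.lower().split())
    return sum(1 for w in question.lower().split() if w in words)


def search(server, question):
    """Return (score, doc_id) pairs with a nonzero score, best first."""
    hits = [(score(question, text), doc_id)
            for doc_id, text in server["docs"].items()]
    return sorted((h for h in hits if h[0] > 0), reverse=True)


# Ordinary servers, each holding its own documents.
servers = {
    "weather": {"docs": {"w1": "weather maps for the northeast"}},
    "poetry": {"docs": {"p1": "poems about the sea"}},
    "patents": {"docs": {"t1": "patent abstracts for computer hardware"}},
}

# The directory of servers is searched like any other server; its
# "documents" are descriptions of the other servers, keyed by name.
directory = {"docs": {
    "weather": "weather maps and forecasts",
    "poetry": "poetry and poems",
    "patents": "patent abstracts",
}}


def two_level_search(question, max_servers=2):
    # Level 1: ask the directory which servers look relevant.
    chosen = [name for _, name in search(directory, question)[:max_servers]]
    # Level 2: send the same question to each chosen server.
    return {name: search(servers[name], question) for name in chosen}


print(two_level_search("weather maps"))  # {'weather': [(2, 'w1')]}
```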

Wide Area Information Server Architecture

[Diagram: Z39.50 over a LAN; Z39.50 over X.25, TCP/IP, or modem as an open-interconnection, public protocol; servers including Dow Jones, the Directory of Servers, TV Guide, etc., image servers, and private servers; gateways to other nets.]

Users' Needs:
Selecting Servers
Answering Questions
Organizing Responses

Architecture Issues:
Scalability
Security
Business model for servers
Reliable Access

Demonstration System Structure

[Diagram: three components connected by a LAN and by X.25 (Z39.50, 9600 baud).]

WAIS:
  Operations: Archiving, Queries, Retrieval
  IR type: Broadcast, Query by Example
  Databases: Wall St Journal, Barron's, 400 Business Mags

Connection Machine (CM):
  Operations: Queries
  IR type: Enhanced relevance feedback
  Databases: DowVision and memos, mail, word processor files

Workstations (Mac):
  Operations: Human interface, Retrieval, Queries, "Caching" docs, User profiles
  IR type: Query by example
  Databases: Personal text, Cached data

WAIS Hardware Components

WAIS Clients

• Busy 24 hours a day finding information

• Ponder all indications of their user's preferences

• Gossip with other clients about their discoveries

• Scour the world (within a budget) to find new sources

• Current implementations on PC, Macintosh, X Windows, NeXT, and dumb terminals (dial-up)

The WAIS Protocol is WAIS


Supports any search syntax

Supports sophisticated clients, putting intelligence in the user's hands

Clients can run on any platform

Multiple servers in a single search

Retrieve any kind of data: text, graphics, video, ...

WAIS Protocol

Based on the NISO Z39.50 standard

Flexible: separates clients from servers

Search: (words, doc_ids, databases)
returns a list of: (headline, score, doc_id, types)

Retrieval: (doc_id, type, start, end)
returns: data of the specified type

Doc_id: an ISBN for the electronic age

Server description structure for the Directory of Servers
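The Search and Retrieval messages can be modeled as plain data. This is a minimal sketch, assuming in-memory Python objects rather than actual Z39.50 messages; the class names (`SearchRequest`, `Hit`, `RetrievalRequest`, `ToyServer`) and the `"TEXT"` type tag are hypothetical, while the fields follow the slide's lists (the slide's doc_id).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SearchRequest:
    words: str                                          # the question text
    doc_ids: List[str] = field(default_factory=list)    # feedback examples (ignored below)
    databases: List[str] = field(default_factory=list)  # ignored: one database here


@dataclass
class Hit:
    headline: str
    score: int
    doc_id: str
    types: List[str]                                    # formats available for retrieval


@dataclass
class RetrievalRequest:
    doc_id: str
    type: str
    start: int
    end: int


class ToyServer:
    """Hypothetical in-memory stand-in for a WAIS server; no Z39.50 encoding."""

    def __init__(self, docs: Dict[str, Tuple[str, str]]):
        self.docs = docs                                # doc_id -> (headline, text)

    def search(self, req: SearchRequest) -> List[Hit]:
        terms = set(req.words.lower().split())
        hits = []
        for doc_id, (headline, text) in self.docs.items():
            s = len(terms & set(text.lower().split()))
            if s:
                hits.append(Hit(headline, s, doc_id, ["TEXT"]))
        return sorted(hits, key=lambda h: h.score, reverse=True)

    def retrieve(self, req: RetrievalRequest) -> str:
        if req.type != "TEXT":
            raise ValueError("this sketch only stores TEXT")
        # start/end select a slice of the stored document text.
        return self.docs[req.doc_id][1][req.start:req.end]


server = ToyServer({
    "d1": ("Computer Firms See the Writing", "computer makers see the writing on the screen"),
    "d2": ("Compaq Stock Split", "compaq directors approve stock split"),
})
hits = server.search(SearchRequest(words="computer screen"))
print(hits[0].doc_id, hits[0].score)                                    # d1 2
print(server.retrieve(RetrievalRequest(hits[0].doc_id, "TEXT", 0, 8)))  # computer
```

Because the client only ever sees headlines, scores, and doc_ids, it can be as simple or as sophisticated as it likes without knowing how the server ranks.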

How a Standard Protocol Can Provide Security

• Users do not log in to the server; they can only search, through an application-layer protocol (Z39.50).

• The server controls access to its data.

• Authentication, encryption, and billing are handled by the network layers below the application, or by the application layer itself.

Connection Machine Server


1-100 GB (and getting bigger)
Supports thousands of users

Automatic indexing

Uses the words and phrases in a question to find appropriate documents, with relevance feedback and weighted terms

Supports Boolean queries

Cost-effective hardware alternative to mainframes

Data Parallelism:
Searching All the Documents at Once

[Figure: weighted query terms, Pharmaceutical +12, FDA +9, Medical +6; other words in the figure include "Stadium" and "Thinking".]

Boolean Search
Retrieve documents containing specific combinations of words

Conceptual Search
Explore a set of documents containing related concepts
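A minimal sketch of weighted-term scoring using the weights from the figure. The documents are invented, and scoring by summing the weights of the terms present is an assumption about how the weights combine. On the Connection Machine the documents were spread across processors and scored together; the dictionary comprehension below does the same work one document at a time.

```python
# Weights from the slide; the three documents are invented.
weights = {"pharmaceutical": 12, "fda": 9, "medical": 6}

documents = {
    "doc-a": "fda approves new pharmaceutical for medical use",
    "doc-b": "medical center opens next to the stadium",
    "doc-c": "thinking about stadium seating",
}


def score(text, weights):
    """Sum the weights of the query terms that occur in the text."""
    words = set(text.lower().split())
    return sum(w for term, w in weights.items() if term in words)


# One score per document; conceptually all computed in the same step.
scores = {doc_id: score(text, weights) for doc_id, text in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'doc-a': 27, 'doc-b': 6, 'doc-c': 0}
print(ranking)  # ['doc-a', 'doc-b', 'doc-c']
```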

Boolean Query

Hard to Use:
Complex Syntax

(Japanese OR Japan) AND
(building OR buildings OR (Real AND Estate)) AND
(Manhattan OR (New AND York))

Poor Results:
The wrong information
No ranking of results

Have you been paying attention?...
Freer Finance: U.S. Regulators Move...
REAL ESTATE: California Initiatives...
First Boston Said To Agree on Sale Of...
Exxon, Rockefeller Group to Sell Site...
What's News - Business and Finance
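To make the "no ranking" problem concrete, here is the slide's query, read with its parentheses balanced, as a Python predicate over a few invented headlines. Each document gets only a yes or a no. Note also that "new" and "york" satisfy the query wherever they appear, and that "building" and "buildings" must be listed separately.

```python
def matches(text):
    """(Japanese OR Japan) AND (building OR buildings OR (Real AND Estate))
    AND (Manhattan OR (New AND York)), on lowercase whole words."""
    w = set(text.lower().split())
    japan = "japanese" in w or "japan" in w
    building = "building" in w or "buildings" in w or ("real" in w and "estate" in w)
    place = "manhattan" in w or ("new" in w and "york" in w)
    return japan and building and place


# Invented headlines for illustration.
docs = [
    "japanese investors buy manhattan building",
    "japan real estate prices fall in tokyo",
    "new york real estate brokers court japan buyers",
    "exxon and rockefeller group to sell site",
]

# A Boolean system returns a set: every match counts equally, no ranking.
print([i for i, text in enumerate(docs) if matches(text)])  # [0, 2]
```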

Query Broadcast to Database
on Connection Machine System

[Figure: the query terms Tripoli, Libyan, PLO, and bomb broadcast to the database.]

Document Retrieval Performance


• Current algorithm limits:
  ~2 GB with 512 MB CM-2
  ~8 GB with 2 GB CM-2
  ~25 GB with 8 GB CM-2

• High recall
• High precision
  (Stanfill and Kahle, see Communications of the ACM, December 1986)

• < 1 sec. response

• Much larger DBs searchable with CM-5 and inverted index algorithms: 100s to 1000s of gigabytes
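Inverted index algorithms avoid touching every document: each word points to the documents that contain it, so a query reads only the lists for its own words. A minimal sketch, with invented documents and a simple count-of-matching-words score; this is a textbook inverted index, not the CM-5 implementation.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def ranked_search(index, question):
    """Count matching question words per document, reading only the
    posting lists of those words instead of scanning every document."""
    scores = defaultdict(int)
    for word in set(question.lower().split()):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented documents for illustration.
documents = {
    "d1": "tripoli bomb report",
    "d2": "plo statement from tripoli",
    "d3": "stadium opens downtown",
}
index = build_index(documents)
print(ranked_search(index, "tripoli bomb"))  # [('d1', 2), ('d2', 1)]
```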

Precision x recall @ 25% recall

[Plot: "...improve with query size": average performance over 13 reference sets against number of query terms (10 to 90), with labels marking a typical Boolean query and a typical relevance feedback query.]
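For reference, one common way to compute precision at a fixed recall level such as 25%: walk down the ranked list until the required fraction of relevant documents has been seen, then divide the relevant documents found by the rank reached. The slide does not spell out its exact measure, so treat this uninterpolated version as an assumption; the ranking and relevance judgments below are made up.

```python
def precision_at_recall(ranked, relevant, level):
    """Precision at the first rank where recall reaches `level`.

    ranked:   document ids in the order the system returned them
    relevant: set of ids judged relevant for the query
    """
    found = 0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            found += 1
            if found / len(relevant) >= level:
                return found / rank
    return 0.0  # the ranking never reached the requested recall


# Made-up ranking and relevance judgments.
relevant = {"a", "b", "c", "d"}
ranked = ["x", "a", "y", "b", "c", "z", "d"]
print(precision_at_recall(ranked, relevant, 0.25))  # 0.5
```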

WAIStation

[Screenshot: the WAIStation window, listing active database sources and saved questions.]

Sources: CM applications; Encyclopedia; King James Bible; Macintosh Hard Disk; TMC Business email; TMC Library Catalog; Wall St. Journal; World Factbook

Saved questions: CM Apps Question; Library question; Encyclopedia; Patent; TMC Bus. Email; TMC Fun; Montvale; World Factbook; poetry; Bible

Select Data Source

[Screenshot: a question window with the fields "Look for documents about", "Which are similar to", "In these sources", and "Results". The source list offers CM applications, Encyclopedia, King James Bible, Macintosh Hard Disk, TMC Business email, TMC Library, Wall St. Journal, and World Factbook.]

Run Initial Query

[Screenshot: the question "recent developments in personal computers" run against the Wall St. Journal. Results:]
Compaq Computer Directors Approve 2-for-1 Stock Split
International: Bull Agrees to Pay Zenith $15 Million to En...
AT&T Set to Announce Memorex Computer Accord
Technology Brief - International Business Machines: Pri...
Business Brief - Data General Corp.: Four Models Are Un...
Technology: Computer Firms See the Writing on the Scree...
Retailing: Businessland Enters Japan, Aided by 4 Big Loca...

Click a Headline to Display a Document

[Screenshot: the same question and results; clicking the headline "Technology: Computer Firms See the Writing..." displays the article:]

International Business Machines Corp., Apple Computer Inc. and other big computer makers are staking out positions in the nascent market for "note-pad computers," small machines that let users enter data by writing rather than tapping keys. The note pads typically recognize numbers and letters printed on a screen with a special pen and convert them into conventional electronic characters. The information is then stored for later transfer to a personal computer or a company's main computers.

The size of the market for note-pad computers isn't clear, but Infocorp, a Santa Clara, Calif., market-research firm, estimates the market will grow to 3.4 million units sold in 1995 from 22,000 units this year. Only one company, Tandy Corp.'s Grid Systems unit, currently sells note-pad computers in the U.S.; its model, introduced last September, is priced at $3,000. But new ventures are expected to introduce several note-pad machines this year. And already, big computer makers are fighting quietly for control over software standards for these gadgets, which require different programs from those ...

Relevance feedback:
"Find me more like this one"

[Screenshot: Question-1, "recent developments in personal computers", with the article "Technology: Co..." placed in the "Which are similar to" field. The results list still shows the headlines from the initial query.]
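Relevance feedback can be sketched as query expansion: the most frequent words of the document the user marked are added to the question. This is a simplification chosen for illustration, not necessarily what WAIS servers did; the stopword list, `extra_terms`, and the documents are hypothetical.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a longer one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "by", "with"}


def terms(text):
    """Lowercase words of the text, minus stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def feedback_query(question, example, extra_terms=5):
    """Add the most frequent words of a document the user marked as
    relevant ("find me more like this one") to the question's words."""
    counts = Counter(terms(example))
    expansion = [word for word, _ in counts.most_common(extra_terms)]
    return set(terms(question)) | set(expansion)


def rank(documents, query):
    """Doc ids with at least one query word, most overlapping words first."""
    scored = [(len(query & set(terms(text))), doc_id)
              for doc_id, text in documents.items()]
    return [doc_id for s, doc_id in sorted(scored, reverse=True) if s > 0]


# Invented documents and feedback example.
documents = {
    "d1": "compaq directors approve stock split",
    "d2": "pen computers for personal use",
    "d3": "personal computers sales rise",
}
example = "pen computers let users write with a pen and pen computers read letters"

print(rank(documents, set(terms("personal computers"))))  # ['d3', 'd2']
query = feedback_query("personal computers", example, extra_terms=2)
print(rank(documents, query))                             # ['d2', 'd3']
```

The marked document pulls "pen" into the query, which lifts the note-pad story above the merely tied one.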

Relevance Feedback of Paragraph

[Screenshot: the article "Technology: Computer Firms See the Writing..." is open. It begins: "Computer makers are scrambling to cash in on people who find the pen mightier than the keyboard." A paragraph of the article, rather than the whole document, serves as the "Which are similar to" example for the question "recent developments in personal computers". The results list shows the same seven headlines as the initial query.]

"Chaining" of Questions to Follow a Tangent

[Screenshot: two question windows. Question-1, "recent developments in personal computers", uses "Technology: Co..." as its example and returns:]
International: Bull Agrees to Pay Zenith $15 Million to En...
AT&T Set to Announce Memorex Computer Accord
Technology Brief - International Business Machines: Pri...
Business Brief - Data General Corp.: Four Models Are Un...
Technology: Computer Firms See the Writing on the Scree...

[A second question takes the headline "Retailing: Busin..." as its example, searches the Wall St. Journal, and returns:]
Retailing: Businessland Enters Japan, Aided by 4 Big Loca...
What's News - Business and Finance
Technology: Computer Makers Agree on a Standard For Ni...
Inside Track: Businessland Directors Take a Loss And Tra...
Technology & Health: Businessland To Report Loss For 3r...
Technology: U.S. Computer Maker Takes on NEC on Its Ow...
Technology: Computer Firms See the Writing on the Scree...

TMC Internet Release


• CM product for TCP/IP (complete server)

• Example user interfaces, free (no support): Macintosh, GNU Emacs, X Windows

• Example Unix server software for creating servers

• Directory of Servers on the Internet, at least through '92

• 160 servers now: weather maps, patents, journal abstracts, email archives, Usenet recipes, ...

• Free software via FTP from Think.com:/wais/*

Mailing list: wais-discussion-request@think.com

WAIS Daily Usage on Quake.Think.Com

[Plot: daily number of clients, different hosts, and searches, by days since April 16, 1991 (0 to 60).]

Usage in 1 day:
600 searches max on Quake
140 searches average on CM
18 searches average on Poetry
59 different hosts max

Total usage of Quake in 2 months:
Different hosts: 508
Number of clients: 6729
Number of searches: 12652
Number of retrievals: 33897
Total transactions: 46549

Countries using WAIS:
Austria, Canada, Denmark, Finland, France, Germany, Holland, Italy, Mexico, Norway, Sweden, Switzerland, USA
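The two-month totals are internally consistent: total transactions equal searches plus retrievals. A quick check, which also gives the number of retrievals per search (the variable names are just labels for the slide's figures):

```python
# Two-month totals for Quake, copied from the slide.
searches = 12652
retrievals = 33897
transactions = 46549

# The total is exactly the searches plus the retrievals.
assert searches + retrievals == transactions

# Retrievals per search over the two months.
print(f"{retrievals / searches:.2f}")  # 2.68
```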

WAIS Uses


Over 10,000 users on the Internet

Users in 24 countries: Mexico, Singapore, Finland, Australia, etc.

160 databases served from 9 countries: Norway, Canada, UK, etc. An average of 3 new databases are registered per week.

WAIS Uses:
Campus-Wide Info Servers


• Class catalog and schedule

• Campus events: movies, sports

• Job listings

• Library catalog

• Phone book

• Professors' research interests

• Past theses


[sol.acs.unt.edu] UNTComputerDoc
[xantos.uio.no] UiO_Publications
[next2.oit.unc.edu] ibm.pc.FAQ

WAIS Uses: Biology


• Journal abstracts

• Sequence archives

• Images

Currently over 20 biology databases in Finland, the Netherlands, and the US

[cmns.think.com] Molecular-biology
[bio.vu.nl] biology-compounds
[genbank.bio.net] biology-journal-contents
[wais.funet.fi] bionic-ai-researchers
[wais.funet.fi] bionic-directory-of-servers
[wais.funet.fi] bionic-enzyme

WAIS Uses: Chemistry
CORE Project

• All published chemistry (8 years, all ACS)

• Scanned pictures, ASCII text

• Optical jukebox mass storage

• Connection Machine / Newton search engines

Project of Bellcore, ACS, Chem Abstracts, OCLC, Cornell, and Thinking Machines

[cujo.curtin.edu.au] chem-eng-current-contents

WAIS Uses:
Business Executives

• Dow Jones information

• Corporate information

• Personal information

Project:

KPMG, Apple, Thinking Machines, Dow Jones

[cmns.think.com] wall-street-journal-sample

[think.com] Business-email

WAIS Uses:
Medical Researchers/Doctors


Medical papers

Storing and matching patient records
Remote connections to specialized databases

[wais.funet.fi] bionic-databases-limb

WAIS Uses:
Community Information


• Dial-up users: no network required

• Directories of services or facilities

• Education and entertainment


[quake.think.com] internet-resource-guide

[sol.acs.unt.edu] online-libraries

[quake.think.com] weather

[lambada.oit.unc.edu] nsf-bulletins

Conclusion


Electronic publishing can fill niches now.

Companies are positioning themselves now (workstations, servers, and information providers).

Thinking Machines is the "Engine of the Information Industry."